In this paper, we study a sequential decision-making problem called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label the points with the goal of maximizing the sum of responses. This problem has wide applications to real-world discovery problems, for example drug discovery with the help of machine learning models. ASD algorithms face the well-known exploration-exploitation dilemma: the algorithm needs to choose points that yield information to improve model estimates, but it also needs to exploit the model. We rigorously formulate the problem and propose a general information-directed sampling (IDS) algorithm. We provide theoretical guarantees for the performance of IDS in linear, graph and low-rank models. The benefits of IDS are shown in both simulation experiments and real-data experiments for discovering chemical reaction conditions.
translated by 谷歌翻译
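Curriculum learning (CL) is a commonly used machine learning training strategy. However, we still lack a clear theoretical understanding of its benefits. In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. For both settings, we derive the minimax rates for CL with an oracle that provides the optimal curriculum and without the oracle, where the agent must adaptively learn a good curriculum. Our results show that adaptive learning can be fundamentally harder than oracle learning in the unstructured setting, but that it introduces only a small extra term in the structured setting. To connect theory with practice, we provide justification for a popular empirical method that selects the task with the highest local prediction gain, by comparing its guarantees with the minimax rates above.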
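In recent years, face detection algorithms based on deep learning have made great progress. These algorithms can generally be divided into two categories: two-stage detectors such as Faster R-CNN and one-stage detectors such as YOLO. Because they offer a better balance between accuracy and speed, one-stage detectors are widely used in many applications. In this paper, we propose a real-time face detector based on the one-stage detector YOLOv5, named YOLO-FaceV2. We design a receptive field enhancement module called RFE to enlarge the receptive field for small faces, and use the NWD loss to compensate for the sensitivity of IoU to the location deviation of tiny objects. For face occlusion, we present an attention module named SEAM and introduce a repulsion loss to address it. Moreover, we use a weight function, Slide, to address the imbalance between easy and hard samples, and use information about the effective receptive field to design the anchors. Experimental results on the WiderFace dataset show that our face detector outperforms YOLO and its variants on all of the easy, medium and hard subsets. Source code: https://github.com/krasjet-yu/yolo-facev2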
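The sensitivity of IoU to small location deviations on tiny objects, which the abstract above gives as the motivation for the NWD loss, can be illustrated with a short sketch. This is not code from YOLO-FaceV2 and it does not implement NWD. The helper names `iou` and `shifted_iou`, the box sizes and the 2-pixel shift are all assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size-by-size box and a copy shifted diagonally by `shift` pixels."""
    box = (0.0, 0.0, float(size), float(size))
    moved = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(box, moved)


# Hypothetical sizes: a 6x6 "tiny face" and a 60x60 "large face",
# both displaced by the same 2 pixels in x and y.
tiny_iou = shifted_iou(6, 2)
large_iou = shifted_iou(60, 2)
print(f"tiny: {tiny_iou:.3f}, large: {large_iou:.3f}")
```

The same 2-pixel error leaves the large box with a high IoU but drops the tiny box well below common matching thresholds, which is why an IoU-based loss or label assignment can behave poorly on small faces.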
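As a bridge between genetic and physiological aspects, animal behavior analysis is one of the most important topics in biological and ecological research. However, identifying, tracking and recording animal behavior is labor-intensive work that requires professional knowledge. To reduce the cost of annotating data, researchers have turned to computer vision techniques for automatic labeling, since most of the data are recorded visually. In this work, we explore a variety of behavior detection algorithms, covering traditional vision methods, statistical methods and deep learning methods. The objective of this work is to provide a thorough survey of related work, giving biologists an overview of efficient animal behavior detection methods. In addition, we discuss the strengths and weaknesses of these algorithms to offer some insights to those already working in this field.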
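Graph neural networks (GNNs) have demonstrated superior performance in a variety of applications. The working mechanism behind them, however, remains mysterious. GNN models are designed to learn effective representations of graph-structured data, which intrinsically coincides with the principle of graph signal denoising (GSD). Algorithm unrolling, a "learning to optimize" technique, has attracted increasing attention because of its promise for building efficient and interpretable neural network architectures. In this paper, we introduce a class of unrolled networks built from truncated optimization algorithms (e.g., gradient descent and proximal gradient descent) for GSD problems. They are shown to be tightly connected to many popular GNN models, in that the forward propagation in these GNNs is in fact an unrolled network serving a specific GSD problem. Moreover, the training process of a GNN model can be viewed as solving a bilevel optimization problem with a GSD problem at the lower level. This connection offers a fresh view of GNNs: we can try to understand their practical capabilities through their GSD counterparts, and it can also motivate the design of new GNN models. Based on the algorithm unrolling perspective, we further propose an expressive model named UGDGNN (unrolled gradient descent GNN), which inherits appealing theoretical properties. Extensive numerical simulations on seven benchmark datasets show that UGDGNN achieves superior or competitive performance compared with state-of-the-art models.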
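As a rough illustration of the unrolling idea in the abstract above, the sketch below runs a fixed number of gradient-descent steps on a common graph signal denoising objective, f(X) = ||X - Y||_F^2 + lam * tr(X^T L X), where L is the graph Laplacian. This objective, the function name `unrolled_gd_denoise`, and all parameter values are assumptions made for illustration. The sketch is not the UGDGNN model and has no learnable weights. Each loop iteration plays the role of one propagation layer.

```python
import numpy as np


def unrolled_gd_denoise(Y, L, lam=1.0, step=0.1, num_layers=8):
    """Run `num_layers` gradient-descent steps on
    f(X) = ||X - Y||_F^2 + lam * tr(X^T L X), starting from X = Y."""
    X = Y.copy()
    for _ in range(num_layers):
        # Gradient of the fidelity term plus gradient of the smoothness term.
        grad = 2.0 * (X - Y) + 2.0 * lam * (L @ X)
        X = X - step * grad
    return X


# Toy example (assumed): a path graph on 4 nodes with a noisy constant signal.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A  # combinatorial Laplacian D - A

rng = np.random.default_rng(0)
Y = np.ones((4, 1)) + 0.3 * rng.standard_normal((4, 1))

X = unrolled_gd_denoise(Y, L)

# Closed-form minimizer of the same objective, for comparison.
X_star = np.linalg.solve(np.eye(4) + L, Y)


def smoothness(Z):
    """Laplacian quadratic form tr(Z^T L Z); smaller means smoother on the graph."""
    return float(np.trace(Z.T @ L @ Z))


print(smoothness(Y), smoothness(X), np.linalg.norm(X - X_star))
```

Each update can be rewritten as X <- (1 - 2*step) * X - 2*step*lam * (L @ X) + 2*step * Y, that is, a neighborhood aggregation plus a skip connection to the input, which is the kind of structure the abstract relates to GNN forward propagation. The step size 0.1 is chosen small enough for this toy graph that the objective decreases at every layer.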
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic ME recognition (MER) is therefore becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been built to alleviate this problem, these efforts remain small in scale. To address this ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that DFME can facilitate research on automatic MER and provides a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a wide range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, projected images), network architectures and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness. Specifically, under a given corruption, different representations perform differently. 2) Although state-of-the-art methods for LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
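To make the "measurement noise" group mentioned above more concrete, here is a minimal sketch of one possible corruption of that kind: independent Gaussian jitter added to every point coordinate. This is an assumed, generic corruption and not the SemanticKITTI-C implementation. The function name `jitter_points`, the noise level `sigma` and the all-zero toy cloud are hypothetical.

```python
import numpy as np


def jitter_points(points, sigma=0.05, seed=0):
    """Return a copy of an (N, 3) point cloud with i.i.d. Gaussian noise
    of standard deviation `sigma` added to each coordinate."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=points.shape)
    return points + noise


# Toy cloud (assumed): 1000 points at the origin, so the output is pure noise
# and its empirical spread can be compared directly with sigma.
cloud = np.zeros((1000, 3))
noisy = jitter_points(cloud, sigma=0.05)
print(noisy.shape, float(noisy.std()))
```

A robustness study in this spirit would evaluate a model trained on clean scans against such corrupted copies at several noise levels and compare the drop in segmentation accuracy.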